Add embedding-based detector #2

RobGeada · 2025-01-22T09:54:20Z

Adds a framework for defining detectors based on a text-embedding classifier. The default configuration uses MMLU as the training data and builds a multi-label text classifier that infers which of the 61 MMLU subjects a given text belongs to. The detector endpoint accepts the following arguments:

- `contents`: List of texts to classify.
- `allowList`: Allowed subjects. Every inbound text must belong to at least one of these subjects to avoid triggering the detector.
- `blockList`: Blocked subjects. Every inbound text must belong to none of these subjects to avoid triggering the detector.
- `threshold`: The maximum distance a text can be from a subject centroid and still be classified into that subject. The default is 0.75; a threshold above 10 classifies every document into every subject, so values with 0 < threshold < 1 are recommended.
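The sketch below illustrates how the four arguments could interact. It assumes Euclidean distance between a text embedding and each subject centroid (the metric is not stated here), uses toy 2-D vectors in place of real embeddings, and the helpers `classify` and `is_flagged` are hypothetical, not the detector's actual functions.

```python
import math

# Hypothetical request body using the argument names described above.
request = {
    "contents": ["text about planets", "text about viruses"],
    "allowList": ["astronomy"],
    "blockList": [],
    "threshold": 0.75,
}

# Toy stand-ins: real centroids/embeddings would come from the embedding model.
centroids = {"astronomy": (1.0, 0.0), "virology": (0.0, 1.0)}
embeddings = [(0.9, 0.1), (0.1, 0.8)]  # one vector per entry in "contents"


def classify(embeddings, centroids, threshold):
    """Assign each text every subject whose centroid lies within `threshold`.

    Assumes Euclidean distance; a text may receive zero, one, or many subjects.
    """
    return [
        {name for name, c in centroids.items() if math.dist(e, c) <= threshold}
        for e in embeddings
    ]


def is_flagged(subjects, allow_list, block_list):
    """Flag a text that matches no allowed subject, or matches any blocked one."""
    if allow_list and not (subjects & set(allow_list)):
        return True
    return bool(subjects & set(block_list))


subjects = classify(embeddings, centroids, request["threshold"])
flags = [is_flagged(s, request["allowList"], request["blockList"]) for s in subjects]
print(subjects, flags)
```

With a very large threshold (e.g. `classify(embeddings, centroids, 100.0)`), every text lands in every subject, which is why small thresholds are recommended.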

detectors/embedding_classification/build/Makefile

detectors/embedding_classification/build/train.py

m-misiura · 2025-01-22T15:59:56Z

detectors/embedding_classification/README.md

@@ -0,0 +1,37 @@
+# Embedding Classification Detector
+
+# Setup


It might be useful to state that the local Python version must match the Python version in the Containerfile. At present, Python 3.9 is downloaded inside the container, which may warrant an upgrade.

detectors/embedding_classification/scheme.py

detectors/embedding_classification/detector.py

christinaexyou · 2025-07-15T19:21:48Z

detectors/embedding_classification/build/dataset_configs/base_dataset_config.py

Can this be moved to a shared utils folder so that other detectors that require training can use this class?

christinaexyou · 2025-07-15T19:53:46Z

detectors/embedding_classification/detector.py

+
+sys.path.insert(0, os.path.abspath(""))
+# from common.scheme import TextDetectionHttpRequest, TextDetectionResponse
+import os


Delete the duplicate `import os`.
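With the duplicate removed, the top of `detector.py` might reduce to something like the following. This is a sketch of the suggested cleanup, not the merged file; it keeps the path tweak and the commented-out import from the hunk above.

```python
import os
import sys

# Make modules under the current working directory importable
# (e.g. a shared `common` package), as in the original snippet.
sys.path.insert(0, os.path.abspath(""))
# from common.scheme import TextDetectionHttpRequest, TextDetectionResponse
```

Importing `os` once, before `sys.path.insert` uses it, also avoids relying on an earlier import elsewhere in the file.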

RobGeada added 2 commits January 22, 2025 09:46

Add embedding based detector

be65d83

add api description to readme

a9b57c8

m-misiura reviewed Jan 22, 2025


christinaexyou reviewed Jul 15, 2025
